97 research outputs found

High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints

Author: Guo Yuchun
Mahony Shaun
Publication venue: Public Library of Science (PLoS)
Publication date: 01/03/2012

An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1. 
The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial control.
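
To make the idea of a spatial binding constraint concrete, the sketch below tallies signed offsets between the binding-event positions of two factors that fall within a fixed window of each other. A sharp peak at one offset would hint at a distance-specific constraint. This is only an illustration and not the GEM method: the function name `offset_histogram`, the window size, and the coordinates are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

def offset_histogram(events_a, events_b, window=100):
    """Count signed offsets (b - a) for every pair of events within `window` bp."""
    sorted_b = sorted(events_b)
    counts = Counter()
    for a in events_a:
        # Binary search keeps the pairing step fast on large event lists.
        lo = bisect_left(sorted_b, a - window)
        hi = bisect_right(sorted_b, a + window)
        for b in sorted_b[lo:hi]:
            counts[b - a] += 1
    return counts

# Made-up coordinates on a single chromosome (illustrative only).
factor_a = [1000, 5000, 9000, 20000]
factor_b = [1012, 5012, 9030, 40000]
hist = offset_histogram(factor_a, factor_b, window=50)
print(hist.most_common(1))  # the most frequent offset and its count
```

In a real analysis the histogram would be compared against a background expectation before calling any offset a constraint; that step is omitted here.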

DSpace@MIT

STAMP: a web tool for exploring DNA-binding motif similarities

Author: Benos Panayiotis V.
Mahony Shaun
Publication venue: Oxford University Press
Publication date: 01/01/2007

STAMP is a newly developed web server that is designed to support the study of DNA-binding motifs. STAMP may be used to query motifs against databases of known motifs; the software aligns input motifs against the chosen database (or alternatively against a user-provided dataset), and lists of the highest-scoring matches are returned. Such similarity-search functionality is expected to facilitate the identification of transcription factors that potentially interact with newly discovered motifs. STAMP also automatically builds multiple alignments, familial binding profiles and similarity trees when more than one motif is inputted. These functions are expected to enable evolutionary studies on sets of related motifs and fixed-order regulatory modules, as well as illustrating similarities and redundancies within the input motif collection. STAMP is a highly flexible alignment platform, allowing users to ‘mix-and-match’ between various implemented comparison metrics, alignment methods (local or global, gapped or ungapped), multiple alignment strategies and tree-building methods. Motifs may be inputted as frequency matrices (in many of the commonly used formats), consensus sequences, or alignments of known binding sites. STAMP also directly accepts the output files from 12 supported motif-finders, enabling quick interpretation of motif-discovery analyses. STAMP is available at http://www.benoslab.pitt.edu/stam

CiteSeerX

Crossref

PubMed Central

DNA Familial Binding Profiles Made Easy: Comparison of Various Motif Alignment and Clustering Strategies

Author: Gary Stormo
Panayiotis V Benos
Philip E Auron
Shaun Mahony
Publication venue: Public Library of Science
Publication date: 01/01/2007

Transcription factor (TF) proteins recognize a small number of DNA sequences with high specificity and control the expression of neighbouring genes. The evolution of TF binding preference has been the subject of a number of recent studies, in which generalized binding profiles have been introduced and used to improve the prediction of new target sites. Generalized profiles are generated by aligning and merging the individual profiles of related TFs. However, the distance metrics and alignment algorithms used to compare the binding profiles have not yet been fully explored or optimized. As a result, binding profiles depend on TF structural information and sometimes may ignore important distinctions between subfamilies. Prediction of the identity or the structural class of a protein that binds to a given DNA pattern will enhance the analysis of microarray and ChIP–chip data where frequently multiple putative targets of usually unknown TFs are predicted. Various comparison metrics and alignment algorithms are evaluated (a total of 105 combinations). We find that local alignments are generally better than global alignments at detecting eukaryotic DNA motif similarities, especially when combined with the sum of squared distances or Pearson's correlation coefficient comparison metrics. In addition, multiple-alignment strategies for binding profiles and tree-building methods are tested for their efficiency in constructing generalized binding models. A new method for automatic determination of the optimal number of clusters is developed and applied in the construction of a new set of familial binding profiles which improves upon TF classification accuracy. A software tool, STAMP, is developed to host all tested methods and make them publicly available. This work provides a high quality reference set of familial binding profiles and the first comprehensive platform for analysis of DNA profiles. 
Detecting similarities between DNA motifs is a key step in the comparative study of transcriptional regulation, and the work presented here will form the basis for tool and method development for future transcriptional modeling studies.
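
The two column comparison metrics named in the abstract, Pearson's correlation coefficient and the sum of squared distances, can be sketched as follows, together with a simple ungapped sliding comparison of two frequency matrices. This is a minimal illustration under stated assumptions, not STAMP's implementation: the function names, the `min_overlap` parameter, and the example matrices are hypothetical, and each column is assumed to be a list of four base frequencies in A, C, G, T order.

```python
import math

def pearson(col_x, col_y):
    """Pearson correlation between two motif columns."""
    mean_x = sum(col_x) / len(col_x)
    mean_y = sum(col_y) / len(col_y)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col_x, col_y))
    var_x = sum((x - mean_x) ** 2 for x in col_x)
    var_y = sum((y - mean_y) ** 2 for y in col_y)
    if var_x == 0 or var_y == 0:
        return 0.0  # uniform column: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def ssd(col_x, col_y):
    """Sum of squared distances between two motif columns (lower is more similar)."""
    return sum((x - y) ** 2 for x, y in zip(col_x, col_y))

def best_ungapped_offset(motif_a, motif_b, min_overlap=3):
    """Slide motif_b along motif_a; return (offset, summed Pearson score) of the best overlap.

    Column j of motif_b is compared with column j + offset of motif_a.
    """
    best = None
    for offset in range(-(len(motif_b) - min_overlap), len(motif_a) - min_overlap + 1):
        score, overlap = 0.0, 0
        for i, col_a in enumerate(motif_a):
            j = i - offset
            if 0 <= j < len(motif_b):
                score += pearson(col_a, motif_b[j])
                overlap += 1
        if overlap >= min_overlap and (best is None or score > best[1]):
            best = (offset, score)
    return best

# Illustrative matrices: motif_b is columns 1..3 of motif_a.
motif_a = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
]
motif_b = motif_a[1:4]
offset, score = best_ungapped_offset(motif_a, motif_b)
```

Summing per-column scores over the overlap favors longer matches; normalizing by overlap length is an equally plausible choice and would change which alignment wins in borderline cases.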

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

Duquesne University: Digital Commons

Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models

Author: Golden Aaron
Mahony Shaun
McInerney James O.
Smith Terry J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Background: Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation. Results: This work explores a new approach to gene prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential. Conclusions: While its raw accuracy rate can be lower than that of other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.
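
Relative synonymous codon usage (RSCU), the indicator mentioned above, is conventionally defined as a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a single coding sequence using the standard genetic code. It illustrates the statistic only and says nothing about how RescueNet itself computes or uses it; the function name `rscu` and the example sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Return {codon: RSCU} for amino acids that occur in the coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        for c in family:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(family) / total
    return values

# Illustrative sequence: Ala (GCT x3, GCC x1) and Lys (AAA, AAG).
example = rscu("GCTGCTGCTGCCAAAAAG")
```

A value of 1.0 means the codon is used exactly as often as expected under uniform synonymous usage; values above 1.0 indicate a preferred codon.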

MURAL - Maynooth University Research Archive Library

Regulatory conservation of protein coding and microRNA genes in vertebrates: lessons from the opossum genome

Author: David L Corcoran
Eleanor Feingold
Panayiotis V Benos
Shaun Mahony
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

BACKGROUND: Being the first noneutherian mammal sequenced, Monodelphis domestica (opossum) offers great potential for enhancing our understanding of the evolutionary processes that take place in mammals. This study focuses on the evolutionary relationships between conservation of noncoding sequences, cis-regulatory elements, and biologic functions of regulated genes in opossum and eight vertebrate species. RESULTS: Analysis of 145 intergenic microRNA and all protein coding genes revealed that the upstream sequences of the former are up to twice as conserved as the latter among mammals, except in the first 500 base pairs, where the conservation is similar. Comparison of promoter conservation in 513 protein coding genes and related transcription factor binding sites (TFBSs) showed that 41% of the known human TFBSs are located in the 6.7% of promoter regions that are conserved between human and opossum. Some core biologic processes exhibited significantly fewer conserved TFBSs in human-opossum comparisons, suggesting greater functional divergence. A new measure of efficiency in multigenome phylogenetic footprinting (base regulatory potential rate [BRPR]) shows that including human-opossum conservation increases specificity in finding human TFBSs. CONCLUSION: Opossum facilitates better estimation of promoter conservation and TFBS turnover among mammals. The fact that substantial TFBS numbers are located in a small proportion of the human-opossum conserved sequences emphasizes the importance of marsupial genomes for phylogenetic footprinting-based motif discovery strategies. The BRPR measure is expected to help select genome combinations for optimal performance of these algorithms. Finally, although the etiology of the microRNA upstream increased conservation remains unknown, it is expected to have strong implications for our understanding of regulation of their expression.

Crossref

PubMed Central

Transcription factor binding site identification using the Self-Organizing Map

Author: Golden Aaron
Hendrix David
Mahony Shaun
Rokhsar Daniel S.
Smith Terry J.
Publication venue: Oxford University Press (OUP)

MOTIVATION: The automatic identification of over-represented motifs present in a collection of sequences continues to be a challenging problem in computational biology. Many existing approaches to motif identification do not always find the relevant biological motifs, or find only a subset of the occurrences of a motif. In this paper, we propose a self-organizing map of position weight matrices as an alternative method for motif discovery. The advantage of this approach is that it can be used to simultaneously characterize every feature present in the data set, thus lessening the chance that weaker signals will be missed. Features identified are ranked in terms of over-representation relative to a background model. RESULTS: We present an implementation of this approach, named SOMBRERO, which is capable of discovering multiple distinct motifs present in a single data set. We demonstrate the advantages of our approach on various data sets, and SOMBRERO’s improved performance over two popular motif-finding programs: MEME and AlignACE. AVAILABILITY: SOMBRERO is available free of charge from http://bioinf.nuigalway.ie/sombrero
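
For readers unfamiliar with position weight matrices, the sketch below builds a log-odds matrix from a handful of aligned sites against a background model and scans a sequence for the best-scoring window. It covers only that basic building block, not the self-organizing map or SOMBRERO's ranking procedure. The function names, the uniform background, the pseudocount, and the example sites are all assumptions made for illustration, and sequences are assumed to contain only A, C, G and T.

```python
import math

def log_odds_pwm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    background = background or {base: 0.25 for base in "ACGT"}
    pwm = []
    for pos in range(len(sites[0])):
        column = {}
        for base in "ACGT":
            count = sum(1 for site in sites if site[pos] == base) + pseudocount
            freq = count / (len(sites) + 4 * pseudocount)
            column[base] = math.log2(freq / background[base])
        pwm.append(column)
    return pwm

def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Illustrative aligned sites and a sequence containing the consensus at index 3.
sites = ["TGACT", "TGACT", "TGAGT", "TGACA"]
pwm = log_odds_pwm(sites)
start, score = best_hit(pwm, "CCCTGACTCCC")
```

A positive score means the window looks more like the motif than like background; the pseudocount keeps unseen bases from producing a score of negative infinity.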

ScholarsArchive@OSU

A multi-parametric flow cytometric assay to analyze DNA–protein interactions

Author: Ana Arbab
David K. Gifford
Er Rolfe
Hyunjii Cho
Joel M. Chick
John Peter Van Hoff
P. Alex
Richard I. Sherwood
Richard L. Maas
Shaun Mahony
Steven P. Gygi
Viveca W. S. Morris
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

Interactions between DNA and transcription factors (TFs) guide cellular function and development, yet the complexities of gene regulation are still far from being understood. Such understanding is limited by a paucity of techniques with which to probe DNA–protein interactions. We have devised magnetic protein immobilization on enhancer DNA (MagPIE), a simple, rapid, multi-parametric assay using flow cytometric immunofluorescence to reveal interactions among TFs, chromatin structure and DNA. In MagPIE, synthesized DNA is bound to magnetic beads, which are then incubated with nuclear lysate, permitting sequence-specific binding by TFs, histones and methylation by native lysate factors that can be optionally inhibited with small molecules. Lysate protein–DNA binding is monitored by flow cytometric immunofluorescence, which allows for accurate comparative measurement of TF-DNA affinity. Combinatorial fluorescent staining allows simultaneous analysis of sequence-specific TF-DNA interaction and chromatin modification. MagPIE provides a simple and robust method to analyze complex epigenetic interactions in vitro.

CiteSeerX

DSpace@MIT

PubMed Central

A Cdx4-Sall4 Regulatory Module Controls the Transition from Mesoderm Formation to Embryonic Hematopoiesis

Author: Davidson Alan J.
DiBiase Anthony
Dorjsuren Bilguujin
Gifford David
Mahony Shaun
Mosimann Christian
Paik Elizabeth J.
Price Emily N.
White Richard M.
Zon Leonard I.
Publication venue: Elsevier BV
Publication date: 01/10/2013

Summary: Deletion of caudal/cdx genes alters hox gene expression and causes defects in posterior tissues and hematopoiesis. Yet, the defects in hox gene expression only partially explain these phenotypes. To gain deeper insight into Cdx4 function, we performed chromatin immunoprecipitation sequencing (ChIP-seq) combined with gene-expression profiling in zebrafish, and identified the transcription factor spalt-like 4 (sall4) as a Cdx4 target. ChIP-seq revealed that Sall4 bound to its own gene locus and the cdx4 locus. Expression profiling showed that Cdx4 and Sall4 coregulate genes that initiate hematopoiesis, such as hox, scl, and lmo2. Combined cdx4/sall4 gene knockdown impaired erythropoiesis, and overexpression of the Cdx4 and Sall4 target genes scl and lmo2 together rescued the erythroid program. These findings suggest that auto- and cross-regulation of Cdx4 and Sall4 establish a stable molecular circuit in the mesoderm that facilitates the activation of the blood-specific program as development proceeds.

DSpace@MIT

Elsevier - Publisher Connector

Harvard University - DASH

PubMed Central

An Integrated Model of Multiple-Condition ChIP-Seq Data Reveals Predeterminants of Cdx2 Binding

Author: A Arvey
A Marson
A Meissner
AC Mullen
AK Tewari
Akshay Kakumanu
B Langmead
C Taslim
Carolyn A. Morrison
D Strumpf
David K. Gifford
E Redhead
EO Mazzoni
Esteban O. Mazzoni
H Ji
H Niwa
H Xu
HS Rhee
Hynek Wichterle
Ilya Ioshikhes
J-CD Heng
JA Granek
JA Stamatoyannopoulos
JP Ferguson
K Liang
KS Zaret
M Berger
M Ku
MAT Figueiredo
Matthew D. Edwards
MD Robinson
MH Kagey
MP Creyghton
P Huggins
PB Rahl
R Jothi
RI Sherwood
Richard I. Sherwood
S John
S Mahony
SG Landt
Shaun Mahony
TL Bailey
TS Mikkelsen
X Chen
X Zeng
Y Guo
Y Zhang
Z Shao
Publication venue: Public Library of Science (PLoS)
Publication date: 01/10/2013

Regulatory proteins can bind to different sets of genomic targets in various cell types or conditions. To reliably characterize such condition-specific regulatory binding we introduce MultiGPS, an integrated machine learning approach for the analysis of multiple related ChIP-seq experiments. MultiGPS is based on a generalized Expectation Maximization framework that shares information across multiple experiments for binding event discovery. We demonstrate that our framework enables the simultaneous modeling of sparse condition-specific binding changes, sequence dependence, and replicate-specific noise sources. MultiGPS encourages consistency in reported binding event locations across multiple-condition ChIP-seq datasets and provides accurate estimation of ChIP enrichment levels at each event. MultiGPS's multi-experiment modeling approach thus provides a reliable platform for detecting differential binding enrichment across experimental conditions. We demonstrate the advantages of MultiGPS with an analysis of Cdx2 binding in three distinct developmental contexts. By accurately characterizing condition-specific Cdx2 binding, MultiGPS enables novel insight into the mechanistic basis of Cdx2 site selectivity. Specifically, the condition-specific Cdx2 sites characterized by MultiGPS are highly associated with pre-existing genomic context, suggesting that such sites are pre-determined by cell-specific regulatory architecture. However, MultiGPS-defined condition-independent sites are not predicted by pre-existing regulatory signals, suggesting that Cdx2 can bind to a subset of locations regardless of genomic environment. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
National Science Foundation (U.S.) (Graduate Research Fellowship under Grant 0645960); National Institutes of Health (U.S.) (grant P01 NS055923); Pennsylvania State University. Center for Eukaryotic Gene Regulatio

CiteSeerX

Public Library of Science (PLOS)

DSpace@MIT

Crossref

Harvard University - DASH

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central